AITopics | regret minimization

Collaborating Authors

regret minimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dynamic Regret Reduces to Kernelized Static Regret

Neural Information Processing SystemsJun-23-2026, 04:09:46 GMT

We study dynamic regret in online convex optimization, where the objective is to achieve low cumulative loss relative to an arbitrary benchmark sequence. By observing that competing with an arbitrary sequence of comparators u1,...,uT in W Rd can be reframed as competing with a fixed comparator function u: [1,T] W, we cast dynamic regret minimization as a static regret problem in a function space. By carefully constructing a suitable function space in the form of a Reproducing Kernel Hilbert Space (RKHS), our reduction enables us to recover the optimal RT(u1,...,uT) = O( pP t ut ut 1 T) dynamic regret guarantee in the setting of linear losses, and yields new scale-free and directionallyadaptive dynamic regret guarantees. Moreover, unlike prior dynamic-to-static reductions--which are valid only for linear losses--our reduction holds for any sequence of losses, allowing us to recover O u 2H +deff(λ)lnT bounds when the losses have meaningful curvature, where deff(λ)is a measure of complexity of the RKHS. Despite working in an infinite-dimensional space, the resulting reduction leads to algorithms that are computable in practice, due to the reproducing property of RKHSs.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Add feedback

Faithful Dynamic Imitation Learning from Human Intervention with Dynamic Regret Minimization

Neural Information Processing SystemsJun-22-2026, 14:00:49 GMT

Human-in-the-loop (HIL) imitation learning enables agents to learn complex behaviors safely through real-time human intervention. However, existing methods struggle to efficiently leverage agent-generated data due to dynamically evolving trajectory distributions and imperfections caused by human intervention delays, often failing to faithfully imitate the human expert policy. In this work, we propose Faithful Dynamic Imitation Learning (FaithDaIL) to address these challenges. We formulate learning from human intervention as an online non-convex problem and employ dynamic regret minimization to adapt to the shifting data distribution and track high-quality policy trajectories. To ensure faithful imitation of human expert despite training on mixed agent and human data, we introduce an unbiased imitation objective and achieve it by weighting the behavior distribution relative to the human expert's as a proxy reward. Extensive experiments on MetaDrive and CARLA driving benchmarks demonstrate that FaithDaIL achieves state-ofthe-art performance in safety and task success with significantly reduced human intervention data compared to prior HIL baselines.

intervention, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Transportation > Ground > Road (0.68)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference

Neural Information Processing SystemsJun-21-2026, 17:25:55 GMT

In multi-armed bandits with network interference (MABNI), the action taken by one node can influence the rewards of others, creating complex interdependence. While existing research on MABNI largely concentrates on minimizing regret, it often overlooks the crucial concern that an excessive emphasis on the optimal arm can undermine the inference accuracy for sub-optimal arms. Although initial efforts have been made to address this trade-off in single-unit scenarios, these challenges have become more pronounced in the context of MABNI. In this paper, we establish, for the first time, a theoretical Pareto frontier characterizing the trade-off between regret minimization and inference accuracy in adversarial (design-based) MABNI. We further introduce an anytime-valid asymptotic confidence sequence along with a corresponding algorithm, EXP3-N-CS, specifically designed to balance the trade-off between regret minimization and inference accuracy in this setting.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

580760fb5def6e2ca8eaf601236d5b08-Supplemental.pdf

Neural Information Processing SystemsApr-26-2026, 00:42:33 GMT

machine learning, mechanism, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Workflow (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

580760fb5def6e2ca8eaf601236d5b08-Paper.pdf

Neural Information Processing SystemsApr-26-2026, 00:42:18 GMT

data mining, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Neural Information Processing SystemsFeb-15-2026, 00:24:32 GMT

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre: Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

Neural Information Processing SystemsFeb-12-2026, 14:16:46 GMT

That is, to minimize simple regret. However, this objective remains understudied.

artificial intelligence, data mining, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.69)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.69)

Add feedback

d0d5dd7bd2ee9f095e50084c2ba3a716-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 01:36:53 GMT

algorithm, imitation, learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Instructional Material > Online (0.50)
Research Report (0.46)

Industry: Education > Educational Setting > Online (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

Near-optimal Swap Regret Minimization for Convex Losses

Hu, Lunjia, Schneider, Jon, Wu, Yifan

arXiv.org Machine LearningFeb-10-2026

We give a randomized online algorithm that guarantees near-optimal $\widetilde O(\sqrt T)$ expected swap regret against any sequence of $T$ adaptively chosen Lipschitz convex losses on the unit interval. This improves the previous best bound of $\widetilde O(T^{2/3})$ and answers an open question of Fishelson et al. [2025b]. In addition, our algorithm is efficient: it runs in $\mathsf{poly}(T)$ time. A key technical idea we develop to obtain this result is to discretize the unit interval into bins at multiple scales of granularity and simultaneously use all scales to make randomized predictions, which we call multi-scale binning and may be of independent interest. A direct corollary of our result is an efficient online algorithm for minimizing the calibration error for general elicitable properties. This result does not require the Lipschitzness assumption of the identification function needed in prior work, making it applicable to median calibration, for which we achieve the first $\widetilde O(\sqrt T)$ calibration error guarantee.

artificial intelligence, machine learning, swap regret, (18 more...)

arXiv.org Machine Learning

2602.08862

Country: